Add state_dict converter for DeepSeekv3 in torchtitan #1538
Conversation
@@ -16,12 +16,12 @@
from tokenizer.tiktoken import BaseTokenizer, IGNORE_INDEX
from torch.distributed.checkpoint.stateful import Stateful
from torch.utils.data import IterableDataset
from transform import CLIPTransform
from utils import load_image
NOTE: This change is only because I ran pre-commit.
@@ -282,10 +282,12 @@ def __init__(self, moe_args: MoEArgs, dim: int, hidden_dim: int):
        self.register_buffer(
            "expert_bias",
            torch.zeros(num_experts, dtype=torch.float32),
            persistent=True,
        )
NOTE: Explicitly specify whether the registered buffer is persistent. When it is false, we do not expect to load it from a DCP checkpoint.
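For reference, a minimal sketch (not from this PR) of what the `persistent` flag controls: persistent buffers appear in `state_dict()` and are restored from checkpoints, while non-persistent buffers are skipped by both. The `Router` class and the second buffer here are illustrative assumptions.

```python
import torch
import torch.nn as nn


class Router(nn.Module):
    def __init__(self, num_experts: int = 8):
        super().__init__()
        # Saved to and restored from checkpoints (e.g. via DCP).
        self.register_buffer(
            "expert_bias",
            torch.zeros(num_experts, dtype=torch.float32),
            persistent=True,
        )
        # Runtime-only statistics; never written to or read from a checkpoint.
        self.register_buffer(
            "tokens_per_expert",
            torch.zeros(num_experts, dtype=torch.float32),
            persistent=False,
        )


router = Router()
print(sorted(router.state_dict().keys()))  # ['expert_bias'] only
```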
Maybe this needs a rebase onto #1526 after it lands.
If I understand correctly, the conversion
- can be used to offline-convert an HF checkpoint from fp8 to fp32 using CPU plain tensors (sketched below).
- can't be used to convert an HF checkpoint on the fly using GPU DTensor, because sharding and quantized blocks may not be aligned well.
- can't be used for weight sync to generate a bf16 state dict, because fake quantization to fp8 is applied.
I think it's OK to land this PR to unblock the first use case, but it would be better to document these limitations clearly somewhere.
I also had some inline comments.
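A hedged sketch of the first use case above: offline dequantization of a blockwise-quantized fp8 HF weight into fp32 on CPU plain tensors. The `BLOCK_SIZE`, the 128x128 block layout, and the `dequantize_block_fp8` helper are assumptions for illustration, not the PR's actual converter.

```python
import torch

BLOCK_SIZE = 128  # assumed per-block quantization granularity


def dequantize_block_fp8(weight: torch.Tensor, scale_inv: torch.Tensor) -> torch.Tensor:
    """Upcast an fp8 weight to fp32, rescaling each (128, 128) block.

    weight:    (M, N) tensor in torch.float8_e4m3fn
    scale_inv: (ceil(M/128), ceil(N/128)) per-block inverse scales
    """
    m, n = weight.shape
    out = weight.to(torch.float32)
    for i in range(scale_inv.shape[0]):
        for j in range(scale_inv.shape[1]):
            rows = slice(i * BLOCK_SIZE, min((i + 1) * BLOCK_SIZE, m))
            cols = slice(j * BLOCK_SIZE, min((j + 1) * BLOCK_SIZE, n))
            out[rows, cols] *= scale_inv[i, j]
    return out
```

This only works when the whole weight is materialized as a plain tensor; with a sharded DTensor, shard boundaries need not align with the 128x128 quantization blocks, which is why the on-the-fly GPU path above doesn't apply.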
LGTM to unblock
Support loading DeepSeek HF weights into the DeepSeek-V3 model:
Numerical verification (using the offline conversion script):
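To make the converter's job concrete, here is a hedged sketch of the key-renaming step a state_dict converter performs between HF checkpoint names and torchtitan module names. The mapping entries and helper names are illustrative assumptions, not the PR's actual rules.

```python
import re

# Hypothetical (HF pattern -> torchtitan pattern) rules for illustration.
_KEY_RULES = [
    (r"^model\.embed_tokens\.weight$", "tok_embeddings.weight"),
    (r"^model\.layers\.(\d+)\.input_layernorm\.weight$", r"layers.\1.attention_norm.weight"),
    (r"^model\.norm\.weight$", "norm.weight"),
]


def convert_hf_key(hf_key: str) -> str:
    for pattern, replacement in _KEY_RULES:
        if re.match(pattern, hf_key):
            return re.sub(pattern, replacement, hf_key)
    return hf_key  # pass through keys with no matching rule


def convert_state_dict(hf_sd: dict) -> dict:
    # Rename every key; tensor values are carried over unchanged
    # (dequantization, if needed, happens in a separate pass).
    return {convert_hf_key(k): v for k, v in hf_sd.items()}
```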